Abstract: The Bethe approximation, or loopy belief propagation algorithm, is a successful method for approximating partition functions of probabilistic models associated with graphs. Chertkov and Chernyak derived an interesting formula called the "Loop Series Expansion", which is an expansion of the partition function. The main term of the series is the Bethe approximation, while the other terms are labeled by subgraphs called generalized loops.

In our recent paper, we derived the loop series expansion in the form of a polynomial with positive integer coefficients, and extended the result to the expansion of marginals. In this paper, we give a clearer derivation of the results and discuss the properties of the newly introduced polynomials.



1 Introduction 

A Markov random field (MRF) associated with a graph is given by a joint probability distribution over a set of variables. In the associated graph, the nodes represent variables and the edges represent probabilistic dependence between variables. The joint distribution is often given in an unnormalized form, and the normalization factor of an MRF is called the partition function.

Computation of the partition function and the marginal distributions of an MRF with discrete variables is in general computationally intractable for a large number of variables, and some approximation method is required. Among many approximation methods, the Bethe approximation has attracted renewed interest from computer scientists; it is equivalent to the Loopy Belief Propagation (LBP) algorithm [2][3], which has been successfully used in many applications such as error correcting codes, inference on graphs, and image processing [4][5][6]. If the associated graph is a tree, the algorithm computes the exact value, not an approximation [9].

The performance of this approximation is surprisingly good for many applications even if the graph has many cycles. If the graph has one cycle, the behavior of the algorithm is well understood, and the maximum marginal assignment of the approximation is known to be exact [10]. On the other hand, if the graph has many cycles, little analysis has been done in connection with the topological structure of the underlying graph. Theoretical analysis of the approximation is important both for improving the algorithm and for extending it to a wide range of applications.

Chertkov and Chernyak [8] give a formula called the loop series expansion, which expresses the partition function in terms of a finite series. The first term is the Bethe approximation, and the others are labeled by so-called generalized loops. The Bethe approximation can be corrected with this formula. This expansion highlights the connection between the accuracy of the approximation and the topology of the graph.

In our recent paper [1], we derived the formula in terms of a message passing scheme with diagrams, based on a method called propagation diagrams. We also showed that the true marginals can be expanded around the approximated marginals.

In this paper we give a simple and easy derivation of the expansion of the partition function and the marginal distributions. A similar but different approach is found in [11]. We also discuss the properties of the bivariate polynomial introduced in [1]. Since this polynomial represents the ratio of the true partition function to the Bethe approximation of the partition function, investigating it is important for understanding the relation between the graph topology and the approximation performance.

2 Bethe approximation and LBP algorithm

In this section, we review the definitions and notation of Markov random fields (MRFs) and the loopy belief propagation algorithm.

2.1 Pairwise Markov random field 

We introduce the probabilistic model considered in this paper: an MRF of binary states with pairwise interactions. Let $G := (V,E)$ be a connected undirected graph, where $V = \{1, \dots, N\}$ is a set of nodes and $E \subset \{ij \mid 1 \le i < j \le N\}$ is a set of undirected edges. We abbreviate undirected edges as $ij = ji$. Each node $i \in V$ is associated with a binary space $\chi_i = \{\pm 1\}$. We make a set of directed edges from $E$: $\vec{E} = \{(i,j), (j,i) \mid ij \in E\}$. The set of neighbors of $i$ is denoted by $N(i) \subset V$, and $d_i = |N(i)|$ is called the degree of $i$. A joint probability distribution on the graph $G$ is given by the form:

$$p(x) = \frac{1}{Z} \prod_{ij \in E} \psi_{ij}(x_i, x_j) \prod_{i \in V} \phi_i(x_i), \tag{1}$$

where $\psi_{ij}(x_i,x_j) : \chi_i \times \chi_j \to \mathbb{R}_{>0}$ and $\phi_i : \chi_i \to \mathbb{R}_{>0}$ are positive functions called compatibility functions. The normalization factor $Z$ is called the partition function. A set of random variables which has a probability distribution in the form of (1) is called a Markov random field (MRF) or an undirected graphical model on the graph $G$.

Without loss of generality, the univariate compatibility functions $\phi_i$ can be neglected because they can be absorbed into the bivariate compatibility functions $\psi_{ij}$. This operation does not affect the Bethe approximation and the LBP algorithm given below; we assume it in what follows.

2.2 Loopy belief propagation algorithm 

The LBP algorithm computes the Bethe approximation of the partition function and the marginal distribution of each node with the message passing method [12][2][3]. This algorithm is summarized as follows.

1. Initialization:

For all $(j, i) \in \vec{E}$, the message from $i$ to $j$ is a vector $m^0_{(j,i)} \in \mathbb{R}^2$. Initialize as

(2)

2. Message Passing:

For each $t = 0, 1, \dots$, update the messages by

$$m^{t+1}_{(j,i)}(x_j) = \omega \sum_{x_i \in \chi_i} \psi_{ji}(x_j, x_i) \prod_{k \in N(i) \setminus \{j\}} m^{t}_{(i,k)}(x_i), \tag{3}$$

until it converges. Finally we obtain $\{m^*_{(j,i)}\}$.

3. Approximated marginals and the partition function are computed by the following formulas:

$$b_i(x_i) := \omega \prod_{j \in N(i)} m^*_{(i,j)}(x_i), \tag{4}$$

$$b_{ji}(x_j, x_i) := \omega\, \psi_{ji}(x_j, x_i) \prod_{k \in N(j) \setminus \{i\}} m^*_{(j,k)}(x_j) \prod_{k' \in N(i) \setminus \{j\}} m^*_{(i,k')}(x_i), \tag{5}$$

$$\log Z_B := \sum_{ji \in E} \sum_{x_j, x_i} b_{ji}(x_j, x_i) \log \psi_{ji}(x_j, x_i) - \sum_{ji \in E} \sum_{x_j, x_i} b_{ji}(x_j, x_i) \log b_{ji}(x_j, x_i) + \sum_{i \in V} (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i),$$

where $\omega$ are appropriate normalization constants, $b_i$ are called beliefs, and $Z_B$ is called the Bethe approximation of the partition function.



In step 2, there is ambiguity as to the order of updating the messages. We do not specify the order, because the fixed points of the LBP algorithm do not depend on its choice.

We normalize as $\sum_{x_i, x_j} b_{ji}(x_j, x_i) = \sum_{x_i} b_i(x_i) = 1$, so that the relation $\sum_{x_i} b_{ji}(x_j, x_i) = b_j(x_j)$ is always satisfied for all $ij \in E$.
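The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `lbp` and `brute_force_z`, the table layout of `psi`, the all-ones initial messages, the parallel update order, and the fixed iteration count are all assumptions of this sketch. State index 0 stands for $x = +1$ and index 1 for $x = -1$.

```python
import itertools
import math

def lbp(n, psi, iters=300):
    """Parallel LBP on a binary pairwise MRF without univariate terms.

    psi maps an edge (i, j) to a 2x2 table psi[(i, j)][a][b] (hypothetical layout).
    Returns node beliefs, edge beliefs and log Z_B.
    """
    nbrs = {i: [] for i in range(n)}
    for i, j in psi:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pot(j, i, a, b):  # psi_{ji}(x_j = a, x_i = b)
        return psi[(j, i)][a][b] if (j, i) in psi else psi[(i, j)][b][a]

    # msg[(j, i)][a]: message from i to j at x_j = a (all-ones start is an assumption)
    msg = {(j, i): [1.0, 1.0] for j in nbrs for i in nbrs[j]}
    for _ in range(iters):
        new = {}
        for (j, i) in msg:
            vec = [sum(pot(j, i, a, b)
                       * math.prod(msg[(i, k)][b] for k in nbrs[i] if k != j)
                       for b in (0, 1)) for a in (0, 1)]
            total = sum(vec)
            new[(j, i)] = [v / total for v in vec]
        msg = new

    def normalize(table):
        z = sum(table.values())
        return {key: v / z for key, v in table.items()}

    b1 = {i: normalize({a: math.prod(msg[(i, j)][a] for j in nbrs[i])
                        for a in (0, 1)}) for i in nbrs}
    b2 = {}
    for (i, j) in psi:
        b2[(i, j)] = normalize({
            (a, b): pot(i, j, a, b)
            * math.prod(msg[(i, k)][a] for k in nbrs[i] if k != j)
            * math.prod(msg[(j, k)][b] for k in nbrs[j] if k != i)
            for a in (0, 1) for b in (0, 1)})

    log_zb = 0.0
    for (i, j), table in b2.items():
        for (a, b), p in table.items():
            log_zb += p * (math.log(pot(i, j, a, b)) - math.log(p))
    for i in nbrs:
        log_zb += (len(nbrs[i]) - 1) * sum(p * math.log(p) for p in b1[i].values())
    return b1, b2, log_zb

def brute_force_z(n, psi):
    """Exact partition function by enumerating all 2^n states."""
    return sum(math.prod(t[x[i]][x[j]] for (i, j), t in psi.items())
               for x in itertools.product((0, 1), repeat=n))

# A star-shaped tree: the Bethe approximation should be exact here.
psi_tree = {(0, 1): [[2.0, 1.0], [1.0, 3.0]],
            (1, 2): [[1.0, 2.0], [0.5, 1.5]],
            (1, 3): [[1.5, 0.5], [1.0, 1.0]]}
_, _, log_zb = lbp(4, psi_tree)
print(math.exp(log_zb), brute_force_z(4, psi_tree))
```

On a tree the two printed numbers should agree up to rounding, which matches the exactness statement in the introduction.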

3 Derivation of the Loop Series Expansion

In this section, we prove the loop series expansion 
formula of the partition function and marginals. 

3.1 Expansion of partition functions 

First, we prove the following identity which plays a key role in our derivation. We define a set of polynomials $\{f_n(x)\}_{n=0}^{\infty}$ inductively by the relations $f_0(x) = 1$, $f_1(x) = 0$ and $f_{n+1}(x) = x f_n(x) + f_{n-1}(x)$. These polynomials are transformations of the Chebyshev polynomials of the second kind.

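Theorem 1. Let $\{\xi_i\}_{i \in V}$ and $\{\beta_{ij}\}_{ij \in E}$ be sets of free variables associated with nodes and edges. Then,

$$\sum_{x_1, \dots, x_N = \pm 1} \prod_{ij \in E} \left(1 + x_i x_j \beta_{ij} \xi_i^{-x_i} \xi_j^{-x_j}\right) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}} = \sum_{s \subset E} \prod_{ij \in s} \beta_{ij} \prod_{i \in V} f_{d_i(s)}(\xi_i - \xi_i^{-1}), \tag{6}$$

where $d_i(s)$ is the degree of node $i$ in the subgraph induced by an edge set $s$.

Proof.

$$\text{(L.H.S.)} = \sum_{\{x_i\}} \sum_{s \subset E} \prod_{ij \in s} x_i x_j \beta_{ij} \xi_i^{-x_i} \xi_j^{-x_j} \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}}$$

$$= \sum_{s \subset E} \prod_{ij \in s} \beta_{ij} \prod_{i \in V} \sum_{x_i = \pm 1} \frac{\left(x_i \xi_i^{-x_i}\right)^{d_i(s)} \xi_i^{x_i}}{\xi_i + \xi_i^{-1}}$$

$$= \sum_{s \subset E} \prod_{ij \in s} \beta_{ij} \prod_{i \in V} \frac{\xi_i^{-d_i(s)+1} + (-1)^{d_i(s)} \xi_i^{d_i(s)-1}}{\xi_i + \xi_i^{-1}}.$$

On the other hand, by the definition of $f_n$,

$$f_n(\xi - \xi^{-1}) = \frac{\xi^{n-1} + (-1)^n \xi^{-n+1}}{\xi + \xi^{-1}}. \tag{7}$$

Hence the factor of node $i$ equals $(-1)^{d_i(s)} f_{d_i(s)}(\xi_i - \xi_i^{-1})$, and the signs cancel because $\sum_{i \in V} d_i(s) = 2|s|$ is even. □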
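Identity (6) can be checked numerically by brute force on a small graph. The sketch below assumes the form of (6) with edge factor $1 + x_i x_j \beta_{ij} \xi_i^{-x_i} \xi_j^{-x_j}$ and node weight $\xi_i^{x_i}/(\xi_i + \xi_i^{-1})$; the helper names `f`, `lhs`, `rhs` and the test values are hypothetical.

```python
import itertools
import math

def f(n, x):
    """f_n(x) from f_0 = 1, f_1 = 0, f_{n+1} = x f_n + f_{n-1}."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, x * b + a
    return a

def lhs(n_nodes, beta, xi):
    """Left hand side of (6): sum over all spin configurations."""
    total = 0.0
    for x in itertools.product((1, -1), repeat=n_nodes):
        term = 1.0
        for (i, j), b in beta.items():
            term *= 1 + x[i] * x[j] * b * xi[i] ** (-x[i]) * xi[j] ** (-x[j])
        for i in range(n_nodes):
            term *= xi[i] ** x[i] / (xi[i] + 1 / xi[i])
        total += term
    return total

def rhs(n_nodes, beta, xi):
    """Right hand side of (6): sum over all edge subsets s."""
    edges = list(beta)
    total = 0.0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = [0] * n_nodes
            for i, j in s:
                deg[i] += 1
                deg[j] += 1
            term = math.prod(beta[e] for e in s)
            for i in range(n_nodes):
                term *= f(deg[i], xi[i] - 1 / xi[i])
            total += term
    return total

# Triangle 0-1-2 with a pendant edge 2-3 (arbitrary test values).
beta = {(0, 1): 0.3, (1, 2): -0.2, (0, 2): 0.5, (2, 3): 0.4}
xi = [1.2, 0.7, 1.5, 0.9]
print(lhs(4, beta, xi), rhs(4, beta, xi))
```

Only the empty set and the triangle survive on the right hand side, since every subset containing the pendant edge has a node of degree one and $f_1 = 0$.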



Secondly, we give a relation between the true partition function and the Bethe approximation.

Lemma 1.

$$\frac{Z}{Z_B} = \sum_{\{x_i\}} \prod_{ij \in E} \frac{b_{ij}(x_i, x_j)}{b_i(x_i) b_j(x_j)} \prod_{i \in V} b_i(x_i). \tag{8}$$

Proof. If we write the normalization terms explicitly, (4) and (5) are rewritten as

$$b_i(x_i) = c_i^{-1} \prod_{j \in N(i)} m^*_{(i,j)}(x_i), \qquad b_{ji}(x_j, x_i) = c_{ji}^{-1}\, \psi_{ji}(x_j, x_i) \prod_{k \in N(j) \setminus \{i\}} m^*_{(j,k)}(x_j) \prod_{k' \in N(i) \setminus \{j\}} m^*_{(i,k')}(x_i). \tag{9}$$

By the definition of the Bethe approximation of the partition function, it is easy to see that

$$Z_B = \prod_{ij \in E} c_{ij} \prod_{i \in V} c_i^{1 - d_i}. \tag{10}$$

Then, using (9) again, the messages cancel and the right hand side of (8) is equal to $Z / Z_B$. □

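Finally, we give the loop series expansion formula of the partition function [1][7][8].

Theorem 2. Let

$$\gamma_i := \frac{b_i(1) - b_i(-1)}{\sqrt{b_i(1) b_i(-1)}} \tag{11}$$

and

$$\beta_{ij} := \frac{b_{ij}(1,1)\, b_{ij}(-1,-1) - b_{ij}(1,-1)\, b_{ij}(-1,1)}{\sqrt{b_i(1) b_i(-1)} \sqrt{b_j(1) b_j(-1)}}. \tag{12}$$

Then, the following formula holds.

$$Z = Z_B \sum_{s \subset E} r(s), \tag{13}$$

$$r(s) := \prod_{ij \in s} \beta_{ij} \prod_{i \in V} f_{d_i(s)}(\gamma_i). \tag{14}$$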

Before completing the proof, let us consider the meaning of the theorem. Equation (11) states that $\gamma_i$ is related to the bias of the approximated marginal $b_i(x_i)$, and $\gamma_i = 0$ if and only if $b_i(1) = b_i(-1)$. Equation (12) states that $\beta_{ij}$ is related to the correlation given by $b_{ij}$. It is easy to see that $|\beta_{ij}| < 1$, and $\beta_{ij} = 0$ if and only if $b_{ij}(x_i, x_j) = b_i(x_i) b_j(x_j)$.

The definitions and properties of these quantities $\beta_{ij}$ and $\gamma_i$, in the context of message passing procedures, are found in [1].

In (13), the summation runs over all subsets of $E$ including $s = \emptyset$. As $f_1(x) = 0$, $s$ makes a contribution to the sum only if no node has degree one in $s$. Such a subgraph $s$ is called a generalized loop [7][8] or a closed graph [12][13]. In the case that $G$ is a tree, there are no generalized loops other than the empty set, therefore $Z = Z_B$. This is an alternative proof of the well-known fact [9].

If $\beta_{ij}$ and $\gamma_i$ are sufficiently small, the first term $r(\emptyset) = 1$ is the main contribution to the sum, and $Z_B$ is close to $Z$.
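The expansion (13) can be tested numerically: run LBP to a fixed point, form $\gamma_i$ and $\beta_{ij}$ from the beliefs as in (11) and (12), and compare $Z_B \sum_s r(s)$ with the exact $Z$. The sketch below is only an illustration under stated assumptions: the helper names, the all-ones initial messages, the parallel schedule, the iteration count and the weakly coupled test potentials (chosen so that LBP converges) are not taken from the paper. State index 0 stands for $+1$, index 1 for $-1$.

```python
import itertools
import math

def lbp_beliefs(n, psi, iters=500):
    """Parallel LBP for a binary pairwise MRF; returns (b1, b2, log Z_B)."""
    nbrs = {i: [] for i in range(n)}
    for i, j in psi:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pot(j, i, a, b):  # psi_{ji}(x_j = a, x_i = b)
        return psi[(j, i)][a][b] if (j, i) in psi else psi[(i, j)][b][a]

    msg = {(j, i): [1.0, 1.0] for j in nbrs for i in nbrs[j]}
    for _ in range(iters):
        new = {}
        for (j, i) in msg:
            vec = [sum(pot(j, i, a, b)
                       * math.prod(msg[(i, k)][b] for k in nbrs[i] if k != j)
                       for b in (0, 1)) for a in (0, 1)]
            total = sum(vec)
            new[(j, i)] = [v / total for v in vec]
        msg = new

    def normalize(table):
        z = sum(table.values())
        return {key: v / z for key, v in table.items()}

    b1 = {i: normalize({a: math.prod(msg[(i, j)][a] for j in nbrs[i])
                        for a in (0, 1)}) for i in nbrs}
    b2 = {(i, j): normalize({
        (a, b): pot(i, j, a, b)
        * math.prod(msg[(i, k)][a] for k in nbrs[i] if k != j)
        * math.prod(msg[(j, k)][b] for k in nbrs[j] if k != i)
        for a in (0, 1) for b in (0, 1)}) for (i, j) in psi}
    log_zb = sum(p * (math.log(pot(i, j, a, b)) - math.log(p))
                 for (i, j), t in b2.items() for (a, b), p in t.items())
    log_zb += sum((len(nbrs[i]) - 1) * sum(p * math.log(p) for p in b1[i].values())
                  for i in nbrs)
    return b1, b2, log_zb

def f(n, x):
    """f_0 = 1, f_1 = 0, f_{n+1} = x f_n + f_{n-1}."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, x * b + a
    return a

def loop_series_z(n, psi):
    """Z_B times the sum of r(s) over all edge subsets, following (11)-(14)."""
    b1, b2, log_zb = lbp_beliefs(n, psi)
    gamma = [(b1[i][0] - b1[i][1]) / math.sqrt(b1[i][0] * b1[i][1]) for i in range(n)]
    beta = {}
    for (i, j), t in b2.items():
        num = t[(0, 0)] * t[(1, 1)] - t[(0, 1)] * t[(1, 0)]
        beta[(i, j)] = num / math.sqrt(b1[i][0] * b1[i][1] * b1[j][0] * b1[j][1])
    edges = list(psi)
    total = 0.0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = [0] * n
            for i, j in s:
                deg[i] += 1
                deg[j] += 1
            total += (math.prod(beta[e] for e in s)
                      * math.prod(f(deg[i], gamma[i]) for i in range(n)))
    return math.exp(log_zb) * total

def brute_force_z(n, psi):
    return sum(math.prod(t[x[i]][x[j]] for (i, j), t in psi.items())
               for x in itertools.product((0, 1), repeat=n))

# A 4-cycle with one chord (two independent cycles), weak couplings.
psi_loop = {(0, 1): [[1.2, 0.9], [0.8, 1.1]], (1, 2): [[1.0, 1.3], [1.1, 0.9]],
            (2, 3): [[0.9, 1.0], [1.2, 1.1]], (0, 3): [[1.1, 0.8], [1.0, 1.2]],
            (0, 2): [[1.3, 1.0], [0.9, 1.0]]}
print(loop_series_z(4, psi_loop), brute_force_z(4, psi_loop))
```

For strongly coupled models the parallel schedule used here may fail to converge, in which case the comparison is not meaningful.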



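Proof of theorem 2. For a given $b_{ij}(x_i, x_j)$ which satisfies the normalization condition $\sum_{x_i, x_j} b_{ij}(x_i, x_j) = 1$, we can always choose $\xi_i$, $\xi_j$, $\beta'_{ij}$ to satisfy

$$b_{ij}(x_i, x_j) = \frac{1}{(\xi_i + \xi_i^{-1})(\xi_j + \xi_j^{-1})} \left( \xi_i^{x_i} \xi_j^{x_j} + \beta'_{ij} x_i x_j \right). \tag{15}$$

From (12), we see that $\beta'_{ij} = \beta_{ij}$. Using

$$b_i(x_i) = \sum_{x_j} b_{ij}(x_i, x_j) = \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}} \tag{16}$$

and (11), we have $\gamma_i = \xi_i - \xi_i^{-1}$. Notice that

$$\frac{b_{ij}(x_i, x_j)}{b_i(x_i) b_j(x_j)} = 1 + x_i x_j \beta_{ij} \xi_i^{-x_i} \xi_j^{-x_j}; \tag{17}$$

the left hand side of (6) is then equal to the right hand side of (8), and the assertion is proved. □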



Example 1

Figure 1: Original graph

Figure 2: List of generalized loops

We give an example of the formula given in theorem 2. Let $G$ be the graph shown in figure 1. In this case, there are five generalized loops, and the expansion formula becomes

$$\frac{Z}{Z_B} = 1 + \beta_{12}\beta_{23}\beta_{13} + \beta_{45}\beta_{56}\beta_{46} + \beta_{12}\beta_{23}\beta_{13}\beta_{45}\beta_{56}\beta_{46} + \beta_{12}\beta_{23}\beta_{13}\beta_{34}\beta_{45}\beta_{56}\beta_{46}\gamma_3\gamma_4. \tag{18}$$
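The generalized loops of this example can be enumerated directly. The sketch below assumes the edge list read off from the last term of (18), that is, two triangles 1-2-3 and 4-5-6 joined by the edge 3-4; the function name `generalized_loops` is hypothetical.

```python
import itertools

def generalized_loops(edges):
    """All edge subsets (including the empty one) with no node of degree one."""
    loops = []
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = {}
            for i, j in s:
                deg[i] = deg.get(i, 0) + 1
                deg[j] = deg.get(j, 0) + 1
            if all(d != 1 for d in deg.values()):
                loops.append(s)
    return loops

# Edge list inferred from the last term of (18).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
loops = generalized_loops(edges)
for s in loops:
    print(s)
```

The enumeration is exponential in the number of edges, so it is only meant for small illustrative graphs.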

3.2 Expansion of marginals 

In this subsection, we work towards proving theorem 3. We define a set of polynomials $\{g_n(x)\}_{n=0}^{\infty}$ inductively by the relations $g_0(x) = x$, $g_1(x) = -2$ and $g_{n+1}(x) = x g_n(x) + g_{n-1}(x)$. This set of polynomials is a transformation of the Chebyshev polynomials of the first kind. We introduce the following lemma, which is a modification of theorem 1.



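Lemma 2.

$$\sum_{x_1, \dots, x_N = \pm 1} x_1 \prod_{ij \in E} \left(1 + x_i x_j \beta_{ij} \xi_i^{-x_i} \xi_j^{-x_j}\right) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}} = \sum_{s \subset E} \prod_{ij \in s} \beta_{ij}\, \frac{g_{d_1(s)}(\xi_1 - \xi_1^{-1})}{\xi_1 + \xi_1^{-1}} \prod_{i \in V \setminus \{1\}} f_{d_i(s)}(\xi_i - \xi_i^{-1}). \tag{19}$$

Proof. Check that

$$\sum_{x_1 = \pm 1} x_1 \left(x_1 \xi_1^{-x_1}\right)^{n} \xi_1^{x_1} = \xi_1^{-n+1} - (-1)^n \xi_1^{n-1} = (-1)^n g_n(\xi_1 - \xi_1^{-1}). \tag{20}$$

The remaining nodes contribute $(-1)^{d_i(s)} f_{d_i(s)}(\xi_i - \xi_i^{-1})$ each, and the signs cancel because $\sum_{i \in V} d_i(s)$ is even. □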



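Theorem 3. [1] Let $p_1(x_1)$ be the true marginal distribution defined by the joint probability distribution (1). Then,

$$\frac{Z}{Z_B}\, \frac{p_1(1) - p_1(-1)}{\sqrt{b_1(1) b_1(-1)}} = \sum_{s \subset E} \prod_{ij \in s} \beta_{ij}\ g_{d_1(s)}(\gamma_1) \prod_{i \in V \setminus \{1\}} f_{d_i(s)}(\gamma_i). \tag{21}$$

Proof. The key fact is

$$\frac{Z}{Z_B} \left(p_1(1) - p_1(-1)\right) = \sum_{\{x_i\}} x_1 \prod_{ij \in E} \frac{b_{ij}(x_i, x_j)}{b_i(x_i) b_j(x_j)} \prod_{i \in V} b_i(x_i),$$

which is a modification of lemma 1. The rest follows the proof of theorem 2, using lemma 2 in place of theorem 1 and $\sqrt{b_1(1) b_1(-1)} = (\xi_1 + \xi_1^{-1})^{-1}$. □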

The problem of finding an assignment that maximizes the marginal probability $p_1$ is called the maximum marginal assignment problem. This problem is especially important in the application to error correcting codes [4][5]. It is known that if the graph has one cycle and the node of interest $1 \in V$ is on the unique cycle, then the assignment that maximizes the belief $b_1$ gives the exact assignment [10]. With theorem 3, this fact is easily deduced as follows [1].

Corollary 1. Let $G$ be a graph with a single cycle with the node 1 on it. See figure 3 for an example. Then, $p_1(1) - p_1(-1)$ and $b_1(1) - b_1(-1)$ have the same sign.

Proof. In the right hand side of (21), only two subgraphs $s$ contribute to the sum: the empty set and the unique cycle. From $g_0(\gamma_1) = \gamma_1$, $g_2(\gamma_1) = -\gamma_1$ and $|\beta_{ij}| < 1$, we see that the sum is $\gamma_1 (1 - \prod_{ij} \beta_{ij})$ with the product taken over the edges of the cycle, which is positively proportional to $\gamma_1$. □



Figure 3: Graph with a single cycle

We append a comment on theorem 3. In the sum of (21), the contribution of $s = \emptyset$ is equal to $g_0(\gamma_1) = \gamma_1$; this is proportional to $b_1(+1) - b_1(-1)$. The assignment that maximizes the belief $b_1$ can thus be regarded as the Bethe approximation from the viewpoint of theorem 3.



4 Bound on the number of generalized loops

In the loop series expansion formula (13), the summation runs over all generalized loops. To know the computational cost of summing up the terms, the number of generalized loops is of interest.

Definition 1.

$$\theta_G(\beta, \xi) := \sum_{s \subset E} \beta^{|s|} \prod_{i \in V} f_{d_i(s)}(\xi - \xi^{-1}). \tag{22}$$

This is a bivariate (Laurent) polynomial with respect to $\beta$ and $\xi$. The coefficients are all integers.

Let $n(G) := |E| - |V| + 1$ be the number of linearly independent cycles. The next lemma shows that the value of $\theta_G$ at $\beta = 1$ is determined by $n(G)$.



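Lemma 3.

$$\theta_G(1, \xi) = \sum_{k=0}^{n(G)} \binom{n(G)}{k} f_{2k}(\xi - \xi^{-1}). \tag{23}$$

Proof. The left hand side of theorem 1 gives an alternative representation of $\theta_G$. If $x_i \neq x_j$, then $1 + x_i x_j \beta \xi^{-x_i} \xi^{-x_j} = 0$ at $\beta = 1$. As we assumed the graph $G$ is connected, only the two terms $x_1 = \dots = x_N = 1$ and $x_1 = \dots = x_N = -1$ contribute to the sum. Therefore,

$$\theta_G(1, \xi) = (1 + \xi^{-2})^{|E|} \left( \frac{\xi}{\xi + \xi^{-1}} \right)^{|V|} + (1 + \xi^{2})^{|E|} \left( \frac{\xi^{-1}}{\xi + \xi^{-1}} \right)^{|V|}.$$

From (7), the right hand side of (23) is equal to

$$\sum_{k=0}^{n(G)} \binom{n(G)}{k} \frac{\xi^{2k-1} + \xi^{-2k+1}}{\xi + \xi^{-1}} = \frac{\xi^{-1} (1 + \xi^{2})^{n(G)} + \xi (1 + \xi^{-2})^{n(G)}}{\xi + \xi^{-1}}, \tag{24}$$

which coincides with the expression above because $1 + \xi^{\pm 2} = \xi^{\pm 1} (\xi + \xi^{-1})$. □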
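Lemma 3 can be checked numerically by evaluating definition (22) at $\beta = 1$ and comparing it with the binomial sum in (23) and with the two-term expression obtained in the proof. The sketch below uses the complete graph on four nodes and an arbitrary real $\xi$; the helper names `f` and `theta` are hypothetical.

```python
import itertools
import math

def f(n, x):
    """f_0 = 1, f_1 = 0, f_{n+1} = x f_n + f_{n-1}."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, x * b + a
    return a

def theta(edges, beta, xi):
    """theta_G(beta, xi) by direct summation over edge subsets, as in (22)."""
    x = xi - 1 / xi
    total = 0.0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(range(len(edges)), r):
            deg = {}
            for k in s:
                u, v = edges[k]
                deg[u] = deg.get(u, 0) + 1
                deg[v] = deg.get(v, 0) + 1
            term = beta ** r
            for d in deg.values():
                term *= f(d, x)
            total += term
    return total

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
n_nodes, n_edges = 4, len(k4)
n_cycles = n_edges - n_nodes + 1          # n(G)
xi = 1.3                                  # arbitrary test point

direct = theta(k4, 1.0, xi)
binomial = sum(math.comb(n_cycles, k) * f(2 * k, xi - 1 / xi)
               for k in range(n_cycles + 1))
two_terms = ((1 + xi ** -2) ** n_edges * (xi / (xi + 1 / xi)) ** n_nodes
             + (1 + xi ** 2) ** n_edges * ((1 / xi) / (xi + 1 / xi)) ** n_nodes)
print(direct, binomial, two_terms)
```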



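If $\xi = \frac{1 + \sqrt{5}}{2}$, then $\xi - \xi^{-1} = 1$. From (23) or (24), we see that

$$\theta_G\!\left(1, \tfrac{1 + \sqrt{5}}{2}\right) = \left( \frac{5 + \sqrt{5}}{2} \right)^{n(G) - 1} + \left( \frac{5 - \sqrt{5}}{2} \right)^{n(G) - 1}. \tag{25}$$

This fact can be used to bound the number of generalized loops [1].

Theorem 4. Let $\mathcal{G}_0$ be the set of all generalized loops of $G$ including the empty set. Then,

$$|\mathcal{G}_0| \le \left( \frac{5 + \sqrt{5}}{2} \right)^{n(G) - 1} + \left( \frac{5 - \sqrt{5}}{2} \right)^{n(G) - 1}.$$

This bound is attained if and only if every node of every generalized loop has degree at most three.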



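Proof. If we set $\beta = 1$ and $\xi = \frac{1 + \sqrt{5}}{2}$,

$$\theta_G\!\left(1, \tfrac{1 + \sqrt{5}}{2}\right) = \sum_{s \in \mathcal{G}_0} r(s), \tag{26}$$

where $r(s) = \prod_{i \in V} f_{d_i(s)}(1)$. As $f_n(1) > 1$ for all $n \ge 4$ and $f_2(1) = f_3(1) = 1$, we have $r(s) \ge 1$ for all $s \in \mathcal{G}_0$, and the equality holds if and only if $d_i(s) \le 3$ for all $i \in V$. This shows $|\mathcal{G}_0| \le \theta_G(1, \frac{1 + \sqrt{5}}{2})$ and the equality condition. □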
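The bound can be illustrated on two small graphs. The sketch below counts generalized loops by enumeration and compares the count with the closed form obtained by evaluating $\theta_G(1, \xi)$ at $\xi = (1+\sqrt{5})/2$, namely $((5+\sqrt{5})/2)^{n(G)-1} + ((5-\sqrt{5})/2)^{n(G)-1}$. The function names and the two test graphs (the complete graph on four nodes, and two triangles sharing a node) are choices made for this illustration.

```python
import itertools
import math

def count_generalized_loops(edges):
    """Number of edge subsets (including the empty one) with no node of degree one."""
    count = 0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = {}
            for i, j in s:
                deg[i] = deg.get(i, 0) + 1
                deg[j] = deg.get(j, 0) + 1
            if all(d != 1 for d in deg.values()):
                count += 1
    return count

def loop_bound(n_nodes, edges):
    """Value of theta_G(1, (1 + sqrt 5) / 2) for a connected graph."""
    n_cycles = len(edges) - n_nodes + 1
    s5 = math.sqrt(5)
    return ((5 + s5) / 2) ** (n_cycles - 1) + ((5 - s5) / 2) ** (n_cycles - 1)

# All degrees are at most three: the bound should be attained.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# Two triangles sharing node 2, which has degree four: the bound should be strict.
bowtie = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]

print(count_generalized_loops(k4), loop_bound(4, k4))
print(count_generalized_loops(bowtie), loop_bound(5, bowtie))
```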

5 Generalization to factor graph model

In this section we briefly introduce the factor graph model, which is more general than the pairwise model in section 2.1. We generalize the result of theorem 2. Theorem 3 is also generalized straightforwardly.

5.1 Factor graph model 

Let $H := (V, F)$ be a hypergraph, that is, $V = \{1, \dots, N\}$ is a set of nodes and $F \subset 2^V$ is a set of hyperedges. A hypergraph $H$ is represented by a bipartite graph $G_H = (V_H, E_H)$. The two types of nodes correspond to the elements of $V$ and $F$; the first type is called a variable node and the second type is called a factor node. For example, see figure 4. In this example, the hypergraph $H = (V, F)$ is given by $V = \{1, 2, 3\}$ and $F = \{\lambda_1, \lambda_2, \lambda_3\}$, where $\lambda_1 = \{1, 2\}$, $\lambda_2 = \{1, 2, 3\}$ and $\lambda_3 = \{2\}$.



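Figure 4: Example of $G_H$

The joint probability distribution is given by the following form:

$$p(x) = \frac{1}{Z} \prod_{\lambda \in F} \psi_\lambda(x_\lambda), \tag{27}$$

where $x_\lambda = \{x_i\}_{i \in \lambda}$. If each hyperedge $\lambda \in F$ consists of a pair of nodes, this class of probability distributions reduces to the pairwise MRF model.

5.2 Loopy belief propagation algorithm

For a node $i \in V$ and a hyperedge $\lambda \in F$ which satisfy $i \in \lambda$, messages $m_{(i,\lambda)}(x_i)$ and $m_{(\lambda,i)}(x_i)$ are defined and updated by the following rules:

$$m^{t+1}_{(i,\lambda)}(x_i) = \omega \sum_{x_{\lambda \setminus \{i\}}} \psi_\lambda(x_\lambda) \prod_{j \in \lambda,\, j \neq i} m^{t}_{(\lambda,j)}(x_j),$$

$$m^{t+1}_{(\lambda,i)}(x_i) = \omega \prod_{\mu \ni i,\, \mu \neq \lambda} m^{t}_{(i,\mu)}(x_i).$$

Beliefs are defined by

$$b_i(x_i) := \omega \prod_{\lambda \ni i} m^*_{(i,\lambda)}(x_i) \tag{28}$$

and

$$b_\lambda(x_\lambda) := \omega\, \psi_\lambda(x_\lambda) \prod_{j \in \lambda} m^*_{(\lambda,j)}(x_j). \tag{29}$$

The Bethe approximation of the partition function is given by

$$\log Z_B := \sum_{\lambda \in F} \sum_{x_\lambda} b_\lambda(x_\lambda) \log \psi_\lambda(x_\lambda) - \sum_{\lambda \in F} \sum_{x_\lambda} b_\lambda(x_\lambda) \log b_\lambda(x_\lambda) + \sum_{i \in V} (d_i - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i).$$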

5.3 Expansion of partition functions 

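To state the factor graph version of theorem 2, we need slightly more complicated notation. For each hyperedge $\lambda \in F$ and $I \subset \lambda$, we introduce a variable $\beta^\lambda_I$. We use the convention that $\beta^\lambda_\emptyset = 1$ and $\beta^\lambda_I = 0$ if $|I| = 1$.

Theorem 1 is modified to the following identity:

$$\sum_{\{x_i\}} \prod_{\lambda \in F} \left( \sum_{I \subset \lambda} \beta^\lambda_I\, (x_{i_1} \xi_{i_1}^{-x_{i_1}}) \cdots (x_{i_k} \xi_{i_k}^{-x_{i_k}}) \right) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}} = \sum_{s \subset E_H} \prod_{\lambda \in F} (-1)^{|I_\lambda(s)|} \beta^\lambda_{I_\lambda(s)} \prod_{i \in V} f_{d_i(s)}(\xi_i - \xi_i^{-1}), \tag{30}$$

where $I = \{i_1, \dots, i_k\}$ and $I_\lambda(s)$ is the set of variable nodes which connect to $\lambda$ by edges in $s$.

Lemma 1 is modified to

$$\frac{Z}{Z_B} = \sum_{\{x_i\}} \prod_{\lambda \in F} \frac{b_\lambda(x_\lambda)}{\prod_{i \in \lambda} b_i(x_i)} \prod_{i \in V} b_i(x_i). \tag{31}$$

Theorem 5. For a factor graph model on $H$, we have

$$Z = Z_B \sum_{s \subset E_H} r(s), \tag{32}$$

$$r(s) := (-1)^{|s|} \prod_{\lambda \in F} \beta^\lambda_{I_\lambda(s)} \prod_{i \in V} f_{d_i(s)}(\gamma_i).$$

Sketch of proof. For a given $b_\lambda(x_\lambda)$, we can choose $\{\xi_i\}_{i \in \lambda}$ and $\{\beta^\lambda_I\}_{I \subset \lambda, |I| \ge 2}$ to satisfy

$$b_\lambda(x_\lambda) = \frac{1}{\prod_{i \in \lambda} (\xi_i + \xi_i^{-1})} \sum_{I \subset \lambda} \beta^\lambda_I \prod_{i \in I} x_i \prod_{j \in \lambda \setminus I} \xi_j^{x_j}.$$

The definition of $\gamma_i$ is the same as (11). □

In the summation (32), only generalized loops in $G_H$ contribute to the sum. Therefore, this expansion is again called the loop series expansion.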



5.4 Expansion of marginals 



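In the same way as in section 3.2, we have the following theorem.

Theorem 6.

$$\frac{Z}{Z_B}\, \frac{p_1(+1) - p_1(-1)}{\sqrt{b_1(1) b_1(-1)}} = \sum_{s \subset E_H} (-1)^{|s|} \prod_{\lambda \in F} \beta^\lambda_{I_\lambda(s)}\ g_{d_1(s)}(\gamma_1) \prod_{i \in V \setminus \{1\}} f_{d_i(s)}(\gamma_i). \tag{33}$$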

6 Properties of $\theta_G$

In the rest of this article, we focus on the (Laurent) polynomial $\theta_G$. The accuracy of the Bethe approximation depends both on the graph topology and on the strength of the interactions; the formula in theorem 2 displays this fact. To exploit the graph topology for the analysis of the performance of the LBP algorithm, we need sophisticated techniques. One such technique is graph polynomials. Graph polynomials have a long history, since Birkhoff introduced the chromatic polynomial [14] and Tutte generalized it to the Tutte polynomial [15].

6.1 Contraction-Deletion relation 

In this subsection we explain that the function $\theta_G$ admits the contraction-deletion relation. If a function on graphs satisfies the contraction-deletion relation and multiplicativity for disjoint unions of graphs, such a function is called a Tutte's V-function [15]. Though the Tutte polynomial is the most famous example of the V-functions, $\theta_G$ is another interesting example.

The graph which is obtained by deleting an edge $e \in E$ is denoted by $G \setminus e$. The graph which is obtained by contracting $e$ is denoted by $G / e$. In this article, the operations of contraction and deletion are only applied to non-loop edges. Note that an edge $e \in E$ is called a loop if both ends of $e$ are connected to the same node.

The following formula for $f_n(x)$ is essential in the proof of the contraction-deletion relation.

Lemma 4. $\forall n, m \in \mathbb{N}$,

$$f_{n+m-2}(x) = f_n(x) f_m(x) + f_{n-1}(x) f_{m-1}(x).$$

Theorem 7. For a non-loop edge $e \in E$,

$$\theta_G(\beta, \xi) = (1 - \beta)\, \theta_{G \setminus e}(\beta, \xi) + \beta\, \theta_{G / e}(\beta, \xi). \tag{34}$$

Sketch of proof. Classify the subsets $s$ in (22) according to whether $s$ includes $e$ or not. Then apply lemma 4 for $s \ni e$. □
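The contraction-deletion relation can be checked numerically on small multigraphs. In the sketch below a graph is a plain list of edges, parallel edges and loops are allowed, and a loop adds two to the degree of its node; the helper names `theta`, `delete` and `contract` and the test values are assumptions of this illustration.

```python
import itertools

def f(n, x):
    """f_0 = 1, f_1 = 0, f_{n+1} = x f_n + f_{n-1}."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, x * b + a
    return a

def theta(edges, beta, xi):
    """theta_G(beta, xi) summed over subsets of edge positions (multigraph safe)."""
    x = xi - 1 / xi
    total = 0.0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(range(len(edges)), r):
            deg = {}
            for k in s:
                u, v = edges[k]
                deg[u] = deg.get(u, 0) + 1
                deg[v] = deg.get(v, 0) + 1   # a loop (u == v) counts twice
            term = beta ** r
            for d in deg.values():
                term *= f(d, x)
            total += term
    return total

def delete(edges, k):
    """G \\ e: drop the k-th edge."""
    return edges[:k] + edges[k + 1:]

def contract(edges, k):
    """G / e: drop the k-th edge and merge its end points."""
    u, v = edges[k]
    assert u != v, "only non-loop edges are contracted"
    return [(u if a == v else a, u if b == v else b)
            for idx, (a, b) in enumerate(edges) if idx != k]

def check(edges, k, beta=0.6, xi=1.4):
    left = theta(edges, beta, xi)
    right = ((1 - beta) * theta(delete(edges, k), beta, xi)
             + beta * theta(contract(edges, k), beta, xi))
    return left, right

triangle = [(0, 1), (1, 2), (0, 2)]
chorded_cycle = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
print(check(triangle, 0))
print(check(chorded_cycle, 4))
```

Isolated nodes contribute a factor $f_0 = 1$, so the node set does not need to be tracked explicitly.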

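6.2 The case of $\xi = \sqrt{-1}$

At the point $\xi = \sqrt{-1}$, $\theta_G$ has special and interesting properties. From (23), $\theta_G(1, \sqrt{-1}) = 0$. The following theorem asserts that $\theta_G(\beta, \sqrt{-1})$ is divisible by $(1 - \beta)^{|E| - |V|}$.

Theorem 8.

$$\omega_G(\beta) := \frac{\theta_G(\beta, \sqrt{-1})}{(1 - \beta)^{|E| - |V|}} \in \mathbb{Z}[\beta]. \tag{35}$$

Proof. Theorem 7 and the definition of $\omega_G$ imply that

$$\omega_G(\beta) = \omega_{G \setminus e}(\beta) + \beta\, \omega_{G / e}(\beta). \tag{36}$$

For a bouquet graph $B_L$, which has a single node and $L$ loops, we can easily check that

$$\omega_{B_L}(\beta) = 1 + (2L - 1) \beta. \tag{37}$$

We can show the assertion inductively by (36) and (37). □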



The value of $\omega_G$ at $\beta = 1$ has a combinatorial interpretation.

Theorem 9. If the graph $G$ does not have a loop edge,

$$\omega_G(1) = \left| \{ \psi : V \to E ;\ \psi \text{ satisfies (c1), (c2)} \} \right|.$$

The condition (c1) is that $\psi$ is injective, and (c2) is that for all $i \in V$, $\psi(i)$ is one of the edges connecting to $i$.

We omit the proof of this theorem. The assumption that the graph is loop-less is not essential.
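The combinatorial interpretation can be checked on a small graph with integer arithmetic. The sketch below relies on an observation that is not stated in the text but follows from the recursion: at $x = 2\sqrt{-1}$ one has $f_n(x) = (1 - n)(\sqrt{-1})^n$, so that $\theta_G(\beta, \sqrt{-1}) = \sum_s (-\beta)^{|s|} \prod_i (1 - d_i(s))$. Division by $1 - \beta$ is done by prefix sums of the coefficient list. The function names are hypothetical, and the code assumes a loop-less graph with $|E| \ge |V|$.

```python
import itertools
import math

def omega_coeffs(n_nodes, edges):
    """Integer coefficients of omega_G(beta), lowest degree first."""
    p = [0] * (len(edges) + 1)
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = [0] * n_nodes
            for i, j in s:
                deg[i] += 1
                deg[j] += 1
            p[r] += (-1) ** r * math.prod(1 - d for d in deg)
    # Divide by (1 - beta) exactly |E| - |V| times: the quotient is the list of prefix sums.
    for _ in range(len(edges) - n_nodes):
        q = list(itertools.accumulate(p))
        assert q[-1] == 0, "polynomial is not divisible by (1 - beta)"
        p = q[:-1]
    return p

def count_maps(n_nodes, edges):
    """Number of injective maps psi: V -> E with psi(i) incident to i."""
    incident = [[k for k, e in enumerate(edges) if i in e] for i in range(n_nodes)]
    return sum(1 for choice in itertools.product(*incident)
               if len(set(choice)) == n_nodes)

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
coeffs = omega_coeffs(4, k4)
print(coeffs, sum(coeffs), count_maps(4, k4))
```

Summing the coefficients evaluates $\omega_G$ at $\beta = 1$, which is compared with the direct count of admissible maps.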

6.3 Relation between $\omega_G$ and matching polynomials

The polynomial $\omega_G(\beta)$, introduced in the previous section, is closely related to the matching polynomial.

A matching of $G$ is a set of edges in which no two edges share a node. If a matching consists of $k$ edges, it is called a $k$-matching. Let $p_G(k)$ be the number of $k$-matchings of $G$. The matching polynomial $\alpha_G$ is defined by

$$\alpha_G(x) := \sum_{k=0}^{[n/2]} (-1)^k p_G(k)\, x^{n - 2k}, \tag{38}$$

where $n = |V|$ is the number of nodes.

We introduce two square matrices indexed by $V$. Let $A$ be the adjacency matrix of the graph $G$ and let $\mathcal{D}$ be the diagonal matrix called the degree matrix. In other words, for a function $\phi : V \to \mathbb{R}$,

$$(A\phi)(i) = \sum_{j \in N(i)} \phi(j), \qquad (\mathcal{D}\phi)(i) = \deg(i)\, \phi(i). \tag{39}$$

Theorem 10.

$$\omega_G(u^2) = \sum_{C : \text{cycles}} 2^{k(C)} \det\!\left[ I + u^2 (\mathcal{D} - I) - u A \right]\Big|_{G \setminus C}\, u^{|C|}, \tag{40}$$

where $\cdot\,|_{G \setminus C}$ denotes the restriction to the principal minor of $G \setminus C$. The summation runs over all node-disjoint cycles, i.e. 2-regular edge-induced subgraphs. The number of connected components of $C$ is denoted by $k(C)$.



We omit the proof of theorem 10. The following corollary shows that $\omega_G$ is a matching polynomial if $G$ is a regular graph.

Corollary 2. If $G$ is a $(q + 1)$-regular graph, then

$$\omega_G(u^2) = \alpha_G(1/u + q u)\, u^n. \tag{41}$$

Proof. In [16], it is shown that

$$\alpha_G(x) = \sum_{C : \text{cycles}} 2^{k(C)} \det\!\left[ x I - A_{G \setminus C} \right]. \tag{42}$$

If $G$ is a $(q + 1)$-regular graph, then $\mathcal{D} = (q + 1) I$. From theorem 10 and (42), the assertion follows. □
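Corollary 2 can be checked numerically on the complete graph on four nodes, which is 3-regular ($q = 2$). The sketch below evaluates $\theta_G$ at $\xi = \sqrt{-1}$ with Python complex numbers, divides by $(1 - \beta)^{|E| - |V|}$, and compares the result with the matching polynomial obtained by counting matchings directly. The helper names and the test value of $u$ are hypothetical.

```python
import itertools

def f(n, x):
    """f_0 = 1, f_1 = 0, f_{n+1} = x f_n + f_{n-1}; x may be complex."""
    a, b = 1.0, 0.0
    for _ in range(n):
        a, b = b, x * b + a
    return a

def omega(n_nodes, edges, beta):
    """omega_G(beta) = theta_G(beta, sqrt(-1)) / (1 - beta)^(|E| - |V|)."""
    xi = 1j
    x = xi - 1 / xi                      # equals 2j
    total = 0.0
    for r in range(len(edges) + 1):
        for s in itertools.combinations(edges, r):
            deg = [0] * n_nodes
            for i, j in s:
                deg[i] += 1
                deg[j] += 1
            term = beta ** r
            for d in deg:
                term *= f(d, x)
            total += term
    return total / (1 - beta) ** (len(edges) - n_nodes)

def matching_poly(n_nodes, edges, x):
    """alpha_G(x) = sum_k (-1)^k p_G(k) x^(n - 2k), with p_G(k) counted by brute force."""
    total = 0.0
    for k in range(n_nodes // 2 + 1):
        p_k = sum(1 for m in itertools.combinations(edges, k)
                  if len({v for e in m for v in e}) == 2 * k)
        total += (-1) ** k * p_k * x ** (n_nodes - 2 * k)
    return total

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
u, q = 0.4, 2
left = omega(4, k4, u * u)
right = matching_poly(4, k4, 1 / u + q * u) * u ** 4
print(left, right)
```

The imaginary part of `left` should vanish up to rounding, since $\omega_G$ has integer coefficients.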